A Novel Method for Classifying Subfamilies and Sub-subfamilies of G-Protein Coupled Receptors

Authors

  • Majid Beigi
  • Andreas Zell
Abstract

G-protein coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of this property and the other physiological roles of the GPCR family, they have been an important target of therapeutic drugs. The function of many GPCRs is not known, and accurate classification of GPCRs can help predict their function. In this study we propose a kernel-based method to classify them at the subfamily and sub-subfamily levels. At the sub-subfamily level, where only a small number of sequences is available (imbalanced data), we used our new synthetic protein sequence oversampling (SPSO) algorithm to enhance the accuracy and sensitivity of the classifiers. At the subfamily level we obtained an overall accuracy and Matthews correlation coefficient (MCC) of 98.4% and 0.98 for class A, nearly 100% and 1 for class B, and 96.95% and 0.91 for class C, respectively; at the sub-subfamily level the overall accuracy and MCC were 97.93% and 0.95. The results show that our oversampling technique can be used in other protein classification applications that face imbalanced data.
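The abstract reports both overall accuracy and MCC, and on imbalanced data the two can diverge sharply. The following is a minimal sketch of the standard binary MCC formula, not the authors' evaluation code; the function name `accuracy_and_mcc` and the confusion-matrix counts are hypothetical and chosen only for illustration. How the paper aggregates the metric across subfamilies is not stated in the abstract.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report MCC as 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical imbalanced test set: 5 positives, 95 negatives.
# A classifier that labels everything negative still looks accurate.
acc_all_neg, mcc_all_neg = accuracy_and_mcc(tp=0, tn=95, fp=0, fn=5)

# A classifier that actually finds most of the rare class.
acc_good, mcc_good = accuracy_and_mcc(tp=4, tn=94, fp=1, fn=1)

print(f"all-negative: accuracy={acc_all_neg:.2f}, MCC={mcc_all_neg:.2f}")
print(f"useful model: accuracy={acc_good:.2f}, MCC={mcc_good:.2f}")
```

In this toy case the all-negative classifier reaches high accuracy with an MCC of zero, which is why MCC is a useful companion metric when a class has few sequences.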

Similar resources

Fast fourier transform-based support vector machine for prediction of G-protein coupled receptor subfamilies.

Although the sequence information on G-protein coupled receptors (GPCRs) continues to grow, many GPCRs remain orphaned (i.e. ligand specificity unknown) or poorly characterized with little structural information available, so an automated and reliable method is badly needed to facilitate the identification of novel receptors. In this study, a method of fast Fourier transform-based support vecto...


HLA and HIV Infection Progression: Application of the Minimum Description Length Principle to Statistical Genetics

Chair: V. Maojo HLA and HIV Infection Progression: Application of the Minimum Description Length Principle to Statistical Genetics Peter T. Hraber, Bette T. Korber, Steven Wolinsky, Henry A. Erlich, Elizabeth A. Trachtenberg, and Thomas B. Kepler Visualization of functional aspects of microRNA regulatory networks using the Gene Ontology Alkiviadis Symeonidis, Ioannis G. Tollis, and Martin Reczk...


A review of the role of dopamine receptors and novel therapeutic strategies in non-small cell lung cancer (NSCLC)

Lung cancer is a very aggressive and most deadly cancer in both men and women. Lung cancer is divided into two types of small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC is divided into 3 subgroups: adenocarcinoma (AC), squamous cell carcinoma (SqCC) and large cell carcinoma (LCC). Dopamine is involved in controlling motions, cognition, emotions, memory and reward mech...


Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models

We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies etc., based on the theory of infinite Gaussian mixtures models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illust...


GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors

G-protein coupled receptors (GPCRs) belong to one of the largest superfamilies of membrane proteins and are important targets for drug design. In this study, a support vector machine (SVM)-based method, GPCRpred, has been developed for predicting families and subfamilies of GPCRs from the dipeptide composition of proteins. The dataset used in this study for training and testing was obtained fro...

Publication date: 2006